Network-based methods for psychometric data of eating disorders: A systematic review

Background Network science represents a powerful and increasingly promising method for studying complex real-world problems. In the last decade, it has been applied to psychometric data in the attempt to explain psychopathologies as complex systems of causally interconnected symptoms. One category of mental disorders, relevant for their severity, incidence and multifaceted structure, is that of eating disorders (EDs), serious disturbances that negatively affect a person’s eating behavior. Aims We aimed to review the corpus of psychometric network analysis methods by scrutinizing a large sample of network-based studies that exploit psychometric data related to EDs. A particular focus is given to the description of the methodologies for network estimation, network description and network stability analysis providing also a review of the statistical software packages currently used to carry out each phase of the network estimation and analysis workflow. Moreover, we try to highlight aspects with potential clinical impact such as core symptoms, influences of external factors, comorbidities, and related changes in network structure and connectivity across both time and subpopulations. Methods A systematic search was conducted (February 2022) on three different literature databases to identify 57 relevant research articles. The exclusion criteria comprehended studies not based on psychometric data, studies not using network analysis, studies with different aims or not focused on ED, and review articles. Results Almost all the selected 57 papers employed the same analytical procedures implemented in a collection of R packages specifically designed for psychometric network analysis and are mostly based on cross-sectional data retrieved from structured psychometric questionnaires, with just few exemptions of panel data. Most of them used the same techniques for all phases of their analysis. In particular, a pervasive use of the Gaussian Graphical Model with LASSO regularization was registered for in network estimation step. Among the clinically relevant results, we can include the fact that all papers found strong symptom interconnections between specific and nonspecific ED symptoms, suggesting that both types should therefore be addressed by clinical treatment. Conclusions We here presented the largest and most comprehensive review to date about psychometric network analysis methods. Although these methods still need solid validation in the clinical setting, they have already been able to show many strengths and important results, as well as great potentials and perspectives, which have been analyzed here to provide suggestions on their use and their possible improvement.


Methods
A systematic search was conducted (February 2022) on three different literature databases to identify 57 relevant research articles. The exclusion criteria comprehended studies not based on psychometric data, studies not using network analysis, studies with different aims or not focused on ED, and review articles.

Results
Almost all the selected 57 papers employed the same analytical procedures implemented in a collection of R packages specifically designed for psychometric network analysis and are mostly based on cross-sectional data retrieved from structured psychometric questionnaires, with just few exemptions of panel data. Most of them used the same techniques for

Introduction
In the present work, we aim to describe the current state-of-the-art of the network conceptualization of psychometric data of a specific psychopathology, namely eating disorders (EDs) via a systematic review of the methods presented in literature carried out following the PRISMA guidelines (Fig 1; [1]). EDs are severe psychiatric syndromes defined by abnormal eating behaviors that negatively affect a person's physical or mental health [2]. They are believed to result from and be sustained by sociocultural, psychological, and biological factors. Anorexia nervosa (AN), bulimia nervosa (BN), and binge eating disorder (BED) are the primary diagnoses associated with ED. In the last century, the paradigm that best got ahead in Western medicine has been the "disease model" [3], according to which all symptoms a person exhibits result from a common cause or latent entity (namely, the underlying disease) that should therefore be targeted by an effective treatment to obtain, as a consequence, the lessening of all the deriving symptoms [4,5].
Unfortunately, in contrast with general medicine, in most mental disorders the identification of common pathogenic pathways has proven elusive [3,[5][6][7], given that they cannot be diagnosed independently of their symptoms [3]. Therefore, the need of conceptualizing in an alternative way the relation between symptoms and disorders arose in the twenty-first century and led to the delineation of the network theory of psychopathology, an innovative approach that inspired an exponentially increasing number of empirical publications in the past two decades, especially after the seminal article by Borsboom and Cramer [3] was published. Differently from the disease model, in the network model, symptoms are conceptualized as mutually interacting and reciprocally reinforcing elements of a complex network, i.e., causally active components of the mental disorder instead of passive receptors of its causal influence (see Fig 2).

The network approach to psychopathology: A theoretical framework
The central idea behind the network approach to psychopathology is that mental disorders arise from casual interactions between symptoms, where causality must be interpreted in the sense of the interventionist theory, according to which the relation between two symptoms is causal if there exists a possible (natural or experimental) intervention on one of them that psychometric variables usually consist of responses to questionnaire items, symptom ratings and cognitive test scores together with other possible personal or psychological indicators [18].
Once enough data are available, the network estimation step can be carried out with the aim of approximating the values of links between pairs of nodes (i.e., the causal influence of one onto the other) and building an appropriate network structure at the system level. Depending on the peculiarities of the data, different statistical methods can be employed: the most frequent approach is that of assessing the edge parameters as conditional associations between variables to estimate the corresponding Pairwise Markov Random Field (PMRF), but the alternative strategy of Bayesian network estimation has been successfully employed as well [5]. Importantly, this step also encompasses the process of node selection and edge selection, the latter via general statistical methods such as fit indices, null-hypothesis testing, or cross-validation procedures.
The result of this step is generally a nontrivial topological structure which becomes the main subject of the network description phase, whose aim is to give a complete characterization of the symptom network with a particular focus on its most important nodes. Here, "importance" has to be intended as how a node is interconnected with the other nodes of the network and is commonly assessed by different centrality measures [5], that is, scalar values assigned to each node within a graph in order to assess their significance based on certain definitions of importance. In general, the tools of network analysis are employed to estimate network density and connectivity through global topological properties, node centrality through local topological properties and more fine-grained structural patterns such as communities and motifs (i.e., mesoscopic level [21]).
Next, it is fundamental to evaluate the stability and robustness of the estimated network and of the centrality indices. In fact, the estimation error and the sampling variation need to be considered in order to not obtain misleading results [22][23][24][25][26]. Altogether, the methods used to assess the accuracy of the estimated parameters and their ability to replicate in a different dataset constitute the network stability analysis [25].
Finally, the psychometric network analysis approach comes to an end with proper inferences which require taking into account both substantive domain knowledge and methodological considerations about the stability and robustness of the estimated network [18].
The clinical implications of such findings concern both the diagnosis and treatment systems. With respect to diagnosis, the network approach suggests clinicians to follow a two-step process according to which they should firstly identify the exhibited symptoms and then the network interactions that sustain them, which however is actually not very different from the current DSM diagnostic practice [6]; the novelty introduced by this approach is that it can help recognize the most important symptoms as well as other psychological traits that are not included in the DSM criteria [27], such as feeling ineffective and social insecurity in case of EDs. As for the treatment of the diagnosed disorder, the network perspective suggests clinicians to apply techniques that can change or manipulate the network, in particular that are capable of: changing the state of one or more symptoms, modifying the external field by removing triggering events or manipulating the network structure by altering the connections among symptoms [6] Aim of the systematic review The aim of this study is to collect and review the existing literature on psychometric network analysis of EDs. To the best of our knowledge, three other reviews centered on the application of this approach to EDs have already been published between 2018 and 2021 [28][29][30]. As already pointed out in [30], although giving important insights into the framework and potentialities of the network research on EDs, Levinson et al., 2018 [28] and Smith et al., 2018 [29] suffer from the limitation of reviewing only a restricted number of studies (3 and 5, respectively). On the other hand, the most recent review paper [30] examines a wider range of articles (i.e., 25) and precisely discusses their results, but leaves out an accurate analysis of the methodologies employed by the researchers. In addition, as already pointed out by the authors, a noteworthy limitation of this latter review is that no study based on longitudinal data was taken into consideration. With this systematic review, we intend to update and broaden the results of such seminal reviews with an even larger and wide-ranging sample of studies, in particular by focusing on the potentialities and limitations of the available methodologies in the field of psychometric network analysis.

Methods
To ensure a standardized review procedure, the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) 2020 statement [1] was followed. No protocol was apriori registered. As this is a systematic review of published literature, ethical approval was not sought.
We also followed guidelines derived by tools such as ROBIS (available at https://www. bristol.ac.uk/population-health-sciences/projects/robis/robis-tool/), aimed at evaluating and identifying concerns with the review process and at judging risks of bias. We carefully assessed the whole review process, i.e., each of the following steps in a sequential manner: study eligibility criteria; identification and selection of studies; data/method collection and study evaluation; synthesis and new findings. All steps were judged to have low or no concern, and consequently the review was assessed as having a low risk of bias.
The articles included in this study were extracted from the databases Scopus, PubMed and PsycInfo in February 2022 by means of a query aimed at retrieving all published articles containing the simultaneous occurrence of terms related to network analysis and EDs; more precisely, we searched for: ("eating disorder � OR "anorexia" OR "bulimia") AND "network analysis" AND NOT "social network" in either the title, abstract or keywords of the articles indexed in the selected databases. Searching the three databases yielded a set of 235 articles, including 135 duplicates, and was further narrowed down to a number of 57 papers by considering the following exclusion criteria: (1) studies not based on psychometric data, (2) studies with aims different from the investigation of EDs, and (3) review articles. All detailed information about each included article is reported in Table 1. The PRISMA flow chart corresponding to the present methodology is shown in Fig 1.

Results
In the following, we analyzed a large sample of network-based studies that exploit psychometric data related to ED. Specifically, we first introduce each step of the (general) psychometric network analysis workflow and then describe and compare the results reported in the articles under review.

Research question
In line with the application to other mental disorders, various research goals can be identified among the existing literature about network approaches to EDs, namely:

Collection of psychometric data
The accomplishment of the above research goals relies on the successful collection of datasets having specific peculiarities, since this will determine the possibility of estimating certain types of networks. The most typical starting point for this kind of analysis is clearly the selection of appropriate psychometric assessment tools, mainly self-report questionnaires and structured clinical interviews [5]. Depending on the sample size and the sampling frequency, three types of data environments can be identified among the current practice of network approaches to psychopathology, namely cross-sectional data, time-series data, and panel or longitudinal data. Cross-sectional data has been the first type of data used in this field and is definitely the most mentioned across the existing literature (e.g., it was used in 48 papers out of 57 in our sample). It is particularly suitable for the estimation of group-level networks, since it provides variable measures taken at a single time point in a large sample. Importantly, the associations between variables are built upon differences among individuals and for such a reason a lot of caution should be taken when inferences about single patients are made, since the very strict conditions under which a structure of intraindividual variation can be deduced from the analogous structure of interindividual variation are seldom met in psychological processes [100,101]. In a cross-sectional dataset, rows can be reasonably assumed to be independent, therefore the corresponding PMRF can be directly estimated from it. Time-series and panel data have been introduced in the psychometric network modeling in order to address two main limitations of cross-sectional data: the lack of clear understanding of individual networks and the inability to capture the dynamic features of psychopathology [5]. Both data environments are characterized by datasets where variables are measured at multiple time points, with the difference that time-series data focuses on one single individual, whereas panel data consist of observations of multiple individuals. Given a time-series dataset, one can estimate two different structures: a directed temporal network of vector autoregressive coefficients where links describe associations between variables through time, and an undirected contemporaneous network where links describe instead the association between variables after the temporal effects have been removed. In case of panel data, a third structure can be estimated, namely a between-subject network, where links indicate the conditional associations between the long-term averages of the time series between people [18].
In line with other experimental studies, the applications to EDs mostly move from crosssectional data. Nevertheless, few exceptions are worth mentioning. Firstly, Levinson and coworkers in three different papers [57-59] used panel data to estimate interindividual networks (temporal, contemporaneous, and between-subject), as well as intra-individual networks (temporal and contemporaneous) for some of the patients in the sample. Other relevant studies aimed at assessing the treatment efficacy by applying statistical techniques [34,36,38,44,51,52,66,79].
Data vary in terms of other features as well. Among the articles under review, 50 out of 57 described their sample as being composed of a great majority of female participants. After all, the fact that EDs are much more common in women than in men is broadly known and documented [102], with reasons usually attributed to social pressure [103], adolescent turbulence, poor body concept, and role confusion [104]. Just one study involved only male participants [47], while few other papers reported a more heterogeneous (mostly nonclinical) sample with male participants within the range of 40-60% [33,53,55,60,71,75,76].
Moreover, in 60% of the cases, participants are reported as clinical, either inpatients or outpatients. Among these, three studies involved users of the Recovery Record [105] smartphone application [38,39,71]. Exceptions consist in mixed samples involving nonclinical patients, such as school or college students, and three case studies based on datasets collected through the crowdsourcing marketplace Amazon Mechanical Turk, (MTurk; [47,55,60]).
Nearly all papers focus on the most common ED diagnosis, namely Anorexia Nervosa (AN) and Bulimia Nervosa (BN). However, some of them also present results concerning secondary EDs, in particular binge-eating disorder [52,85], and night eating disorders [32].
Various psychometric assessment questionnaires were used as tools for data collection. For the evaluation of ED specific symptoms, the most widely used tests were the Eating Disorder Inventory (EDI; [87,93,98], the Eating Disorder Examination Questionnaire (EDE-Q; [92]), and the Eating Pathology Symptom Inventory (EPSI; [90]). For the assessment of general psychological factors other tests were also used, for example the Symptom Check-List 90 (SCL-90; [106,107]), and the Beck Depression Inventory (BDI-II; [108]). Note that, among the cited psychometric tests, the only one that has been designed to assess both ED specific symptoms as well as other general integrative psychological constructs is EDI.

Methods for network estimation and reconstruction
Once the data has been collected, the next fundamental point is determining the variables of interest, that is, the nodes of the network. Instead of considering the totality of the items included in the questionnaires, a common practice is that of reducing their number in an effort to produce more accurate results by avoiding redundant (i.e., collinear) variables. The final nodes do not generally comprise all the items of the questionnaires. Instead, they are chosen in either one of the following ways, namely by taking into account just some special item aggregates such as questionnaires' subscales, by employing the goldbricker() function of the R package networktools [109], which compares correlations between variables and identifies the collinear ones, or by combining the latter with a further theoretically driven selection of items.

Cross-sectional networks
The types of networks that can be estimated depend on what kind of data is available. In case of cross-sectional data, the main solutions are association networks, concentration (or partial correlation) graphs, regularized partial correlation networks, and Bayesian networks, where the first three types are undirected, weighted and can all be estimated with the qgraph [110] R package, while the last one is direct, either weighted or unweighted, and can be obtained with the help of the bnlearn [111] R package.
Association networks are the most basic types of networks that can be estimated from crosssectional data. Edges correspond to zero-order correlations between symptoms, indicating the probability of their co-occurrence. For example, the qgraph() R function with input parameter graph = "cor" will compute an association network by estimating zero-order correlations between each pair of variables through the Pearson coefficient r.
Consider a set of p variables X = (X 1 ,. . .,X p ), each described by n observation, that is, Given paired data fðx ð1Þ i ; x ð1Þ j Þ; . . . ; ðx ðnÞ i ; x ðnÞ j Þg, the Pearson coefficient is defined as: Þ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi X n k¼1 ðx ðkÞ i À � X i Þ 2 ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi X n where n is the sample size, x ðkÞ i and y ðkÞ i are two sample points indexed with k, and � X i ; � X j are the sample means of the variables X j and X j .
Association networks have two main limitations: first, they do not give any information about the direction of causal relationships, and second, they do not discern true relations from spurious ones and from those caused by the influence of other nodes [5].
Concentration networks solve the second of these limitations by estimating edges as partial correlations between symptoms after adjusting for the influence of all other nodes in the networks; only the edges whose value is above a fixed threshold are then kept. Formally, the partial correlation between two variables X and Y given a set of n controlling variables Z ={z 1 ,. . .,z n } is written as Corr(X i ,X j |Z) and is given by the correlation between the residuals e X and e Y resulting from the linear regression of X with Z and of Y with Z, respectively. A network where each edge corresponds to the partial correlation between the connected nodes can be estimated through the qgraph() function by setting the parameter graph = "concentration".
When dealing with p-multivariate data X ¼ ðX 1 ; . . . ; X p Þ � N ðm; SÞ all information needed to compute the partial correlation coefficients is encoded in the variance-covariance matrix S. In fact, once its inverse is defined (i.e., the so-called precision matrix K), one can directly apply the following relationship to recover the partial correlation coefficients: where k ij denotes the element in row i and column j of the precision matrix K = ∑ −1 and X −(i,j) denotes the set of variables without X i and X j . These coefficients can be graphically displayed in a weighted undirected network where each node corresponds to a variable and edges between them are given by the partial correlation coefficients. If the ij th component of ∑ −1 is zero, then the variables X i and X j are conditionally independent, given the other variables, and no edge will be traced between them. This model is a type of PMRF and it is called Gaussian Graphical Model, shortly GGM [24]. Forbush et al.
[45] gave an example of an ED symptom network estimated as an association graph and also proposed the corresponding concentration network in the supplementary material of the same paper. When the number of variables to estimate is high, it has been suggested that a more appropriate model to use is the regularized partial correlation network [112], obtainable running the gqraph() R function with input parameter graph ="glasso". The result is similar to a concentration graph in the fact that edges indicate partial correlation between nodes, however it has the relevant difference of including the implementation of an L1 regularization technique called graphical LASSO (least absolute shrinkage and selection operator; [113]) which allows for the shrinking of all small partial correlations to zero. This procedure returns a sparse network that parsimoniously accounts for the covariance among nodes, in the sense that only the edges that are most robust and most likely to reflect real associations are kept [5]. Formally, the graphical LASSO gives an estimation of the precision matrix K = ∑ −1 by solving the optimization problem of maximize the penalized log-likelihood max K flog det K À traceðSKÞ À ljjKjj 1 g over nonnegative definite matrices K, where S is the empirical covariance matrix of X, λ is a nonnegative tuning parameter and ||K|| 1 denotes the 1 -norm, that is, the sum of the absolute values of the elements of ∑ −1 . Hence, the higher the λ value, the more K ij will be set to zero.
Clearly, if λ is too low, then too many spurious edges risk being included (i.e., yielding a high number of false positives), whereas if λ is too high, then the risk is to remove relevant connections (false negatives). Hence, the λ parameter needs to be tuned. The best model (i.e., the most likely to maximize the number of "true" edges while minimizing the spurious ones) is then identified through the Extended Bayesian Information Criterion (EBIC; [114]) for multiple λ values and then choosing the model with lowest EBIC score. Let P be a subset of {1,. . .,p} and let υ = |P| be the cardinality of this subset, then the best λ is chosen to be the one that maximizes [114,115].
It has been suggested by Williams and Rast [116] that the graphical LASSO gives an accurate representation of data only when the number of nodes vastly exceeds the number of cases; if not, a non-regularized network should be preferred. Nevertheless, almost all of the articles under review based on cross-sectional data employ the graphical LASSO to estimate the symptom network.
Notably, in Monteleone et al. [66] the network structure of the cross-sectional data corresponding to each time period of the analysis is estimated via a Mixed Graphical Model (MGM), an extension of GGM that allows the integration of variables following two or more different data distribution [117]. In the specific case of this study, MGM was chosen in favor of GLASSO to include the treatment group as a binary variable into the model and thus assess its effect with respect to the treatment as usual (TAU) control group.
Bayesian networks attempt to discern causality by representing data as directed acyclic graphs (DAGs) where arrows indicate the direction of predictions and, possibly, causality [7]. DAGs depict the joint probability distribution of the variables and can thus be decomposed into the conditional distribution of each node given its parent. Importantly, dependence relations should not be confused with temporal antecedence, which cannot be derived from crosssectional data in any way. What restrains Bayesian networks from widespread application is the existence of strict assumptions about data that it is pretty difficult to find in psychological analysis settings according to clinical observations [5]. The first condition is that all relevant causal variables should be included in the system. Second, the causal Markov Condition [7] must be met. Third, the probability distributions of certain variables might not be unrestricted. And fourth, it might not be easy to choose the best model among all possible ones [7]. Moreover, one additional assumption is suggested by the definition itself of DAGs, namely that all loops of any length are prohibited. A Bayesian network can be estimated from multivariate data through the bnlearn R package [111].
In a paper that analyzes the relationship between EDs and childhood abuse, Rodgers et al. [73] followed this Bayesian approach in two steps: they first wrote down a "blacklist" of forbidden edges to limit the investigation to patterns of symptom relationships that made both conceptual and clinical sense; next, they estimated the DAG through the hill-climbing algorithm. This is an iterative machine-learning process that starts with an arbitrary network structure and tries to improve it by making incremental changes to the network (e.g., adding, removing, or reversing edges). At each step, the Bayesian Information Criterion (BIC) is computed, and the change is kept if and only if it results in a lower BIC. This process continues until no further improvements can be found and the network model that represents the best fit for the interactive structure of the ED psychopathology is returned.
Diverging from the other studies, Bronstein et al.
[33] employed a combined procedure to test different hypotheses. In particular, they used the exploratory causal discovery algorithm Greedy Fast Causal Inference (GFCI; [118]) to investigate potential causal pathways involving ED symptoms, biased and inflexible interpretations, and socioemotional functioning. GFCI takes as input a dataset of continuous variables (e.g., psychometric data) and outputs a graphical model called Partial Ancestral Graph (PAG; [119]), which is a representation of a set of Bayesian Networks that cannot be distinguished by the algorithm. Notably, GFCI is characterized by the ability of detecting latent confounders (i.e., an unmeasured variable that casually influences two or more measured variables); this information is conveyed by different edge types in the generated PAG.

Joint estimation of cross-sectional networks
In many of the studies under review, multiple networks were estimated with the specific aim of comparing their structure in various populations. As it will be later explained in more details, most often this task is accomplished by first estimating each network separately and only later some pivotal test statistics are computed to highlight global and local topological differences.
However, when the observations in a dataset consist of several distinct classes, it is also possible to adopt a recently developed technique called Joint Graphical Lasso (JGL), which allows for jointly estimating multiple graphical models corresponding to distinct but related conditions [120]. In particular, the JGL fits GGMs on data with the same variables observed on different classes but differs in the estimation of uncoupled GGMs from independent samples in the fact that, besides the lasso penalty on density, the regularized optimization problem for the evaluation of the precision matrices of each class includes an additional penalty term that is specifically written to foster similarity between groups.
Suppose we are given K datasets Y (1) ,. . .,Y (K) ,K�2, where each Y (k) is a n k ×p matrix consisting of n k observations on a set of p features common to all K datasets. We assume that all the observations X K k¼1 n k are independent and that, within each dataset, Y ðkÞ � N ðm k ; S k Þ. Then one can define the empirical covariance matrix for Y (k) as S ðkÞ ¼ 1 n k ðY ðkÞ Þ T Y ðkÞ . Danaher et al. [120] proposed to estimate the precision matrices (∑ (1) ) −1 ,. . .,(∑ (K) ) −1 by maximizing the penalized log-likelihood max y � X K k¼1 n k ðlog dety ðkÞ À traceðS ðkÞ y ðkÞ ÞÞ À PðfygÞ � where θ (1) ,. . .,θ (k) are assumed to be positive definite and P({θ}) denotes a convex penalty function chosen to encourage precision matrices to share certain characteristics (e.g., the locations or values of the nonzero elements or the sparsity).
Depending on the explicit definition of the penalty function, JGLs are classified as Fused Graphical Lasso (FGL) and Group Graphical Lasso (GGL). The former encourages shared edge values across classes, whereas the latter only encourages a similar pattern of sparsity across all precision matrices. Hence, the FGL results in a stronger form of similarity [120]. The penalty of the FGL has the form where λ 1 and λ 2 are nonnegative tuning parameters, the first controlling the' 1 -penalties applied to each off-diagonal element of the K precision matrices, the second those applied to the differences between corresponding elements of each pair of precision matrices. Hence, large values of λ 1 will increase the sparsity of the precision matrices (just like in graphical LASSO), while large values of λ 2 will cause many elements to be identical across classes. Both types of JGL have already been implemented in the R package EstimateGroupNetwork [121], which also include methods for the automatic tuning parameter selection. A final consideration about JGL is that its network estimations cannot behave worse than independent GGMs [120][121][122]. In fact, when the tuning procedure selects a value of the corresponding tuning parameter equal or very close to zero, independent GGMs are estimated via typical graphical lasso. Therefore, JGL cannot hide differences nor inflate similarities across groups.
The JGL has been applied in the EDs research as well. Schlegl et al. [77] used the FGL with k-fold cross-validation for parameter selection to estimate four different networks, one for each of the following groups: adolescents with AN, adults with AN, adolescents with BN, and adults with BN. Similarly, Smith et al. [78] used the FGL to estimate and compare distinct networks from samples corresponding to either of the following groups: outpatients without ED diagnosis, outpatients with a lifetime attempt of suicide, and people with a current ED diagnosis. Martini et al. [62] employed FGL to compare a sample of AN patients with a control group. Finally, [47] jointly estimated and compared the network structures of two nonclinical samples of men with and without core ED symptoms.

Longitudinal and personalized networks
Time-series data allow for the estimation of personalized networks of two different kinds. Temporal networks incorporate consecutive temporal effects among symptoms by representing arrowhead edges pointing from one node to the other (or to itself in case of self-loops) when the first predicts the second in the next window of measurement. Edges are also weighted according to the regression parameters [123,124]. Temporal networks are commonly estimated as lag-1 Vector-Autoregression (VAR) models [125], which consist of a set of regression equations on the given variables; these are all treated as endogenous, so they act as both outcome and predicted variables. The relations assessed through this method can also be interpreted in terms of Granger causality [126] by stating that, whenever an arrow from a timevarying symptom X to another time-varying symptom Y is found, then X Granger-causes Y, meaning that predictions of the value of Y based on its own past values and on the past values of X are better than predictions of Y based only on Y's own past values. Clearly, Granger causality should not be erroneously understood as pure causality. Rather, Granger causality gives evidence of temporal prediction and thus it can be potentially indicative of causality in the sense that, although the existence of a causal relationship would imply the observation of a temporal prediction, the opposite is not true, since temporal links may also arise for other reasons. In addition, some temporal predictions can also be missed because of lack of statistical power or insufficient sized lag interval [124].
The residuals of the temporal VAR model are used to compute the so-called contemporaneous network, which can be estimated as a GGM model, with edges representing the partial correlation obtained after controlling for both temporal effects and all other variables in the same window of measurement [124]. Together, the modeling framework including the estimation of both temporal and contemporaneous networks from a given dataset is called graphical VAR or GVAR [127]. It can be computed with the help of the R package graphicalVAR [128], which allows for both regularized and unregularized estimations [123].
When time-series of multiple subjects are available, it is possible to gain more insight into the network structure at the group-level by applying the multilevel-VAR model. For a given population, the average network parameters are called "fixed effects", whereas the person-specific deviations from these fixed effects are called "random effects". Random effects can be used to estimate intraindividual networks (temporal and contemporaneous) and to investigate interindividual differences [129]. Fixed effects can be instead used to uncover information about average intraindividual effects. More generally, if a multivariate normal is assumed for all parameters, then estimating the GVAR model on longitudinal data allows for the decomposition of the variance into three different structures, namely temporal networks, contemporaneous networks and between-subjects networks [123], where the latter is a GGM that allows for the examination of between-mean relationships for all individuals of the dataset. Estimation methods for multilevel-VAR models have been implemented in the R package mlVAR [128], which in particular permits to choose among different estimation procedures, such as sequential univariate multi-level estimation, multivariate Bayesian estimation, and fixed effects estimation.
In the context of the studies about EDs, Levinson et al.
[57] first conducted a pilot study in which they collected longitudinal data from N = 66 participants by asking them to complete an Ecological Momentary Assessment (EMA) survey [130,131] on ED cognitions and behaviors across one week. Using the graphicalVAR and mlVAR packages, they then estimated intraindividual networks to identify which symptoms maintain EDs within each individual as well as group-level networks, namely temporal, contemporaneous and between-subject networks. Later, they repeated an analogous study on a different dataset composed of longitudinal data of N = 1272 participants with the additional goal of comparing adolescents and adults network structures [58]. More recently, another study was conducted to exclusively estimate the individual networks of N = 34 participants with the aim of identifying and discussing a range of target symptoms for personalized ED treatment [59].

Approaches and tools for network description
The analytical phase that follows the estimation of the network structure from data is designated to the investigation and interpretation of specific characteristics exhibited by either single nodes or groups of nodes jointly. Although a first visual inspection can give some general clues, many numerical methods can be employed to gain deeper insights.
Centrality measures in GGMs. Depending on the type of network estimated, different measures can be used to investigate the role of each symptom in the network. In the case of undirected weighted networks, Opsahl et al. [132] proposed a generalization to weighted networks (see Table 2) of the most common measures of node centrality that were originally designed by Freeman [133] for binary networks, namely: degree, strength, closeness, and betweenness centrality.
However, all the measures above only consider weights in absolute value with the consequence that two nodes may have same centrality but opposing effects on the rest of the network. For example, an increase in Node A can cause an increase in Node B (positive influence) Table 2. Definition and interpretation of common centrality measures.

Centrality Measure Formulation Interpretation
Degree: number of edges connected to a node C D ðvÞ ¼ P u2V fvg e vu where e vu is an edge between u and v Higher values indicate higher centrality of that node in the network, that is, a central node is one with many connections. It is useful mainly in unweighted graphs.

Strength (S):
sum of the absolute weights (e.g., partial correlation coefficients) of the edges connected to a node C S ðvÞ ¼ P u2V fvg jw vu j where |w vu | is the absolute weight between u and v It represents the likelihood that activation of a certain symptom will induce the activation of other direct symptoms [5]. Just like the degree centrality, strength only accounts only for paths of unitary length.

Closeness (C):
inverse of the sum of the (minimum) distance of a node to all the other nodes in the network. It can be normalized as the average length of the weighted shortest paths to all the other nodes It indicates the index of expected time until arrival of something flowing through the network. In simple words, a node with high closeness is one that is close, on average, to other nodes. Hence, nodes with low closeness centrality are likely to be sooner influenced by changes in the network.
Betweenness (B): number of the geodesics between any two other nodes that pass through a given node C B v ð Þ ¼ g uz ðvÞ g uz , where g uz is the number of binary shortest paths between two nodes, and g uz (v) is the number of those paths that go through node v This measure plays an important role in the assessment of comorbidities, since symptoms with high betweenness centrality usually serve as bridge symptoms; when activated, they are likely to spread to both syndromic clusters [5]. In other words, a high-betweenness node is one that acts as a bridge in the communication with other nodes. It exhibits the same performance of strength when the network only contains positive weights, whereas it outperforms strength as the number of negative weights increases. It has been proven that EI is a better predictor of declines in the severity of symptoms over time. It is particularly useful when the aim is to identify target symptoms for therapeutic deactivation since these can only be identified through negative correlations.
This table describes the centrality measures that are most commonly used in the analysis of undirected weighted networks, as formalized by Opsahl et al. [132]. The last row describes the (one-step) expected influence centrality [134]. For each measure, its definition, mathematical formulation and interpretation are given. with the same strength that an increase in Node C causes a decrease in Node D (negative influence). In this case, Node A and C will have the same centrality index but an opposite effect. Moreover, a node with a similar number of strong positive and negative edges may have little to no overall network activation impact, that is, it might be highly central without being highly influential. Hence, a new centrality metric called expected influence (EI) has been introduced to consider both negative and positive edges (see Table 2, last row; [134]).
As argued by Bringmann et al. [135], betweenness and closeness centrality do not seem to be especially suitable as measures of node importance. Hence, results about these metrics should be interpreted with care. As an additional proof about this fact, many articles reported betweenness and closeness centrality as not satisfying the minimum stability results and did not include their estimation in the network description [31,36,46,64,74,79,82]. Others decided instead to take these measures into account but did not assess their stability [45,69].
Throughout the large number of studies reviewed, few symptoms appeared among the most central ones across heterogeneous samples and estimation techniques. Among these: shape and weight overvaluation, body dissatisfaction, drive for thinness, ineffectiveness (i.e., feeling of inadequacy, insecurity, worthlessness and having no control over one's life), and lack of interoceptive awareness (i.e., ability to identify, access, understand, and respond properly to the patterns of internal signal). For a complete list refer to Table 3.
Diverging from the other studies, 3 papers also assessed the importance of nodes using key players analysis. Specifically, they identified the nodes that, when removed, would result in a maximally disconnected residual network. In other words, a treatment targeting these key nodes is expected to slow the cascade of symptoms through the ED network. Analyzing the symptom network of a sample of participants with mixed ED diagnoses, Forbush et al. [45] identified its key players to be: people encouraged me to eat more, need to exercise nearly every day, and try to avoid foods with high calorie content. Perko et al. [71] employed this metric to assess sex differences in ED symptoms, but they found that the key players were, in both cases, items related restricting dieting and binge eating. More recently, Liebman et al.
[60] explored the associations between posttraumatic stress disorder and ED symptom in presence of at least one experience of childhood abuse and found that the key players of the network were: purging, negative alterations in cognitions and mood, and hyperarousal.
Interpretation and measures of importance in Bayesian networks. When interpreting a Bayesian network visualized as a DAG, node importance cannot be evaluated through the above-mentioned centrality measures for undirected graphs. Rather, Rodgers et al. [73]  This table summarizes the symptoms (left column) that were most often identified as central in the psychopathology networks built from psychometric data concerning either specific or nonspecific ED symptoms. The central column indicates the metric used (S = strength, EI = expected influence, C = closeness, and B = betweenness).
The column on the right contains the list of papers that find the symptom as a result of their analysis. https://doi.org/10.1371/journal.pone.0276341.t003 proposed to assess this property based on the relative contribution of each symptom to the overall model fit of the Bayesian network. In particular, they evaluated the scaled contribution of each symptom to BIC for three different DAGs: one estimated from the full sample, one from the subsample of participants who experience childhood abuse in addition to having received an ED diagnosis, and one for the subsample of those that only suffer from an ED. The result was that in the first and third group the most important symptoms were overvaluation of shape and weight, depressed mood and eating large amounts of food, whereas in the second group a different pattern of relationships among symptoms emerged, with depressed mood, dietary restraint, self-induced vomiting, and driven exercise being the most important driving symptoms of the disorder. The dissimilarity found was consistent with the concept of maltreated ecophenotype [136], according to which distinct subtypes of a given disorder may be developed as a consequence of abuse and trauma. This hypothesis has been also confirmed by other experimental studies on different psychiatric disorders including EDs [137].

Understanding comorbidities between EDs and other psychopathologies.
With the specific aim of identifying bridge symptoms, four network statistics have been developed and implemented in the R package networktools; importantly, since they are not specific to the type of network estimated, they can be readily applied to intraindividual networks, group-level networks, and other networks different from the psychopathology ones [138]. They are defined as follows: • Bridge strength (BS), that is, the sum of the absolute value of every edge which connects a given symptom to symptoms belonging to other disorders. Bridge in-strength and bridge outstrength can be analogously defined in directed networks by considering only the subset of edges that are, respectively, directed toward the node or issued from a node.
• Bridge expected influence (BEI), which is defined just like the bridge strength but without taking the absolute value. In directed networks, only edges issuing from a node are summed.
• Bridge Closeness (BC), that is, the average distance from a given node to all nodes outside of its own disorder.
• Bridge betweenness (BB), that is, the number of times a given node lies on the shortest path between any two nodes belonging to two distinct disorders (including the one it belongs to).
Refer to Table 4 for the list of the bridge symptoms between EDs and other psychopathologies that have been assessed through the metrics defined above in some of the papers under review.
Assessing the role of the external field in the development and maintenance of Eds. Interactions between ED specific symptoms and various elements of the external field have also been investigated. Few studies used the bridge centrality measures to accomplish this task. Monteleone et al. [67,137] chose instead to adopt a different approach to explore the psychological pathways through which childhood maltreatment (CM) experiences promote the development of ED core symptoms. Namely, they first selected few variables from items and scores of different psychometric questionnaires in order to build symptom networks for each ED diagnosis, then the shortest path between any CM and ED node was computed using Dijkstra's algorithm, and finally they used mediation analysis to confirm the mediation role of the symptoms included in the shortest pathways from CM to ED specific symptoms. All the other studies considered all variables as a single community and chose to determine the core symptoms using the centrality measures in their classical form with the specific aim of verifying whether EDs are mainly maintained by ED specific symptoms or rather by nonspecific ones. The results achieved throughout the papers under review are summarized in Table 5.
Longitudinal studies. All longitudinal studies that used the graphicalVAR and mlVAR for network estimation also computed node centrality through the following measures: strength for between-subject and contemporaneous networks, in-strength and out-strength for temporal networks. In the latter case, that specific choice of metrics was taken to underlie the different impact of incoming against outgoing edges. More precisely, In-strength is given by the sum of links pointing towards the node and indicates how much information that node is receiving from directly connected nodes. Instead, out-strength is given by the sum of links pointing from one node to all the other and indicates how much information that node is sending to directly connected nodes. This distinction has the precious advantage of suggesting which nodes (i.e., those with highest out-strength), have the potentiality of having downstream effects on other symptoms if treated [59].
As for the group-level temporal networks, the symptoms with highest in-strength centrality were desire to be thin, body checking [57], fasting, fear of weight gain and feeling fat [58,59]. Those with highest out-strength centrality were exercise, binge eating [57], feeling fat and fear of weight gain [58]. The strongest symptoms in the contemporaneous and between-subject group-level networks were desire to be thin [57], and feeling fat [58]. Among the individual Table 4. Bridge symptoms between eating disorders and other psychopathologies.

Comorbidity Bridge Symptoms References
Trait anxiety Avoidance of social eating (ED) Lacking self-confidence (anxiety) [48] Social anxiety disorders Social eating and drinking [99] Nervousness focused on appearance [33,99] Concern over being judged [76] Autism spectrum disorder Poor self-confidence Concerns over eating around others Concerns over others seeing one's body  . Pre-to post-treatment studies. Few studies have been conducted to compare pre-and post-treatment network models. In particular, they all estimate and compare GGM networks at each time point (mainly at admission and discharge, in few cases also at follow-up) and few studies also assess the prognostic value of the most central nodes at baseline through linear or logistic regression. One of them [44] first computed zero-order correlations between each symptom at baseline and three outcome measures (i.e., treatment recovery status, clinical impairment, and posttreatment BMI) and then tested whether these prognostic values were associated with the expected influence of symptoms at baseline via linear regression. The results of this study show that EI values remained constant across all time points, with the strongest nodes being feeling fat, fear of weight gain, discomfort seeing one's own body, dissatisfaction with weight, and a strong desire to lose weight. The authors also found that more severe symptom levels were associated with a lower possibility of recovery, higher clinical impairment and higher BMI. Finally, they observed that centrality of symptoms at baseline was significantly associated with prognostic values for both recovery status and clinical impairment. Similarly, Hagan et al. [51] found that pretreatment central symptoms in adolescents with AN significantly predicted early response but did not predict remission. A third study [69] also found that the most central symptoms, namely interoceptive awareness and ineffectiveness, did not change after treatment. The authors also used multiple regression to test whether the identified core symptoms predicted outcomes (BMI, depression, and anxiety) at discharge. Their hypothesis was confirmed for BMI and depression but not for anxiety. Finally, Brown et al.
[34] showed that stronger desire to lose weight at admission was associated with lower likelihood of achieving remission at discharge. Many other studies only estimated and compared the centrality of symptoms at different time points. A change in the role of certain symptoms was found, with the strongest at baseline being fearing weight gain, dietary rules [36], eating disorder-related impairment, self-esteem and shape concern [52], and the strongest at discharge being dietary restraint [36,52], food preoccupation, feelings of fatness and discomfort seeing its own body [36]. Smith et al. [79] only computed centrality measures of the admission network, finding that the strongest symptoms were: shape and weight-related concentration difficulties, general concentration difficulties, guilt about eating, desire to lose weight, and nervousness.
Finally, the study of Monteleone et al. [66], explicitly designed to assess the clinical change promoted by TAU enhanced by RecoveryMANTRA compared to TAU alone, did not compute the centrality of symptoms as it was out of scope, but it focused instead on the differential improvement in symptom strengths between the two samples at each time point (from baseline to 12-months follow-up). In particular, they found that RecoveryMANTRA was associated with a direct effect on few symptoms (i.e., anxiety, shape concern and restraint but not on motivation, stress and depression as hypothesized) only at the end of the intervention.but not at follow-up. Furthermore, they computed the predictability of symptoms in terms of explained variance [161] and found an increase in predictability of the network from baseline to 12 months follow-up, suggesting that treatment indeed promoted changes in the associations between symptoms.

Network stability analysis
Network stability analysis has been conducted in most of the papers under review, especially in those published after 2018, when Epskamp and his colleagues Borsboom and Fried [24] proposed precise guidelines for this task. In particular, they suggested specific methods to assess the robustness of the model at three distinct levels: accuracy of the estimated edge weights, stability of the order of centrality indices, and difference between specific edge weights or centrality indices.
Accuracy of edge weights. The accuracy of edge weights can be evaluated by constructing intervals that reflect the sensitivity of edge weight estimates to sampling error, such as confidence intervals (CIs), credibility intervals and bootstrapped intervals [18]. When handling ordinal data as in the current study, it has been suggested to derive the (1-α) CIs via nonparametric bootstrap at a given confidence level, for example, α = 0.05 [24]. The narrower the CIs, the more likely is that the estimated edge weight is close to its real value, since in 95% of the cases such a CI will contain the true value of the parameter. Although large CIs can result in a poor accuracy for centrality indices, they do not influence the presence of an edge, nor its sign, as these properties are already assessed by LASSO. Moreover, it should be noticed that since we use regularization to estimate the network structure, all edge weight estimates are biased towards zero and, consequently, all sampling distributions are biased towards zero as well, just like CIs are not centered around the true unbiased parameter value anymore. This implies that, when interpreting the quantiles of the bootstrapped sampling distribution, if they overlap with zero it could be that the corresponding CI does not overlap with zero, while if they do not overlap with zero, then also the corresponding CI does not overlap with zero. In other words, CIs should not be interpreted as significance tests to zero, but only to show the accuracy of edge weight estimates by evaluating the size of CIs and to compare edges to one another by checking if the corresponding CIs do not overlap; if so, then we can conclude that they significantly differ at the given significance level, in the other case, then we cannot infer the contrary since they might still significantly differ [24].
Stability of centrality indices. As noticed by Epskamp and colleagues [24], the same bootstrap technique cannot be used to construct true CIs around the centrality indices. As an alternative, they proposed to investigate the stability of the order of centrality indices based on subsets of the data, that is, by comparing the order of centrality indices after re-estimating the network with fewer cases or nodes. When this is done for various proportions of cases to drop, then one can also assess the correlation between the original centrality indices and those obtained from subsets. If this correlation keeps stable after dropping a considerable proportion of the cases (e.g., 10%), then the interpretations of centralities can be considered stable. This method has been called case-dropping subset bootstrap. Exploiting this technique, a quantification of the stability of a centrality index can be given by the correlation stability coefficient (CScoefficient), a measure representing the maximum proportion of cases that can be dropped, such that the correlation between original centrality indices and that of networks built on subsets with 95% probability is still equal or higher than a given value which is set to 0.7 by default. As a cutoff score for interpreting an estimated centrality index as stable, Epskamp and colleagues [24] suggested that the CS-coefficient value should be above 0.5 or in any case not below 0.25. As already mentioned, this cutoff was not reached by the CS-coefficient of closeness and betweenness centrality in almost all cases. On the other side, strength and expected influence always attained pretty good values, usually significantly above the threshold.
Methodological differences in edge weights and centralities. The last technique proposed by Epskamp [24] is the bootstrapped difference test, which is a null-hypothesis test used to assess whether the edge weights or centralities differ from one another. This is accomplished by considering the difference between the two bootstrap values of edge weights or centrality indices under study and constructing a bootstrapped CI around those difference scores. If zero is not in the bootstrapped CI, then the null hypothesis is rejected, meaning that there is evidence that two values differ from one-another. On the other hand, it should be noticed that not rejecting the null-hypothesis is not sufficient for inferring that the null-hypothesis is true. Finally, Epskamp et al. [24] emphasized and warned about the fact that this technique does not take into account any correction for multiple testing, since applying Bonferroni correction is not feasible in practice in this context. As a consequence, as the number of performed significance tests increases, the probability of finding several significant results purely by chance (Type 1 error) also increases.
Other approaches for stability estimation. The above-mentioned stability techniques do not apply to Bayesian networks. As an alternative, Rodgers et al. [73] quantified the arc strength, that is, the degree of confidence it is possible to have when interpreting specific pathways, through a bootstrapping procedure introduced by Friedman et al. [162,163] and implemented in the bnlearn R package. The general idea behind this procedure is that we should be more confident on features that would still be induced when we perturb the data. Therefore, in a nonparametric bootstrap setting, one first generates perturbations by re-sampling with replacement from the given dataset and then examines how many of the perturbed structures exhibit the feature under study that, in this specific case, corresponds to the presence and direction of each edge. Their relative frequency across the bootstrapped samples gives an estimation of each arc strength [164]. Rodgers et al. [73] reported an average frequency of 84% concerning the presence of edges correctly identified across bootstrapped samples, and an average frequency of 63% for their direction. The arc fear of gaining weight ! cognitive restraint had the highest presence frequency of 99%, while the arc preoccupation with eating and body image ! depressed mood had the highest direction frequency of 88%.

Common methods for network comparison
Many of the studies under review aimed at comparing network structures across different populations. A specific tool to accomplish this task, namely the Network Comparison Test (NCT; [17]) has been devised and implemented in the R package NetworkComparisonTest [165]. Compared to other statistical tests, the NCT overcomes the usual assumption of normality and possible improper null hypothesis that are unsuitable for the regularized parameters that results after GGM estimation with LASSO regularization, the most typical method employed in the network approach to psychopathology.
The NCT is a 2-tailed permutation test in which the difference between two groups is calculated repeatedly for randomly regrouped individuals. It consists of three steps: first, the network structure is estimated for both groups using the original data and the metric of interest is calculated; second, data is permuted iteratively to rearrange group memberships, networks are then re-estimated, and metrics are calculated based on permuted data to create a reference distribution; finally, the significance of the observed test statistic is evaluated by comparing it to the reference distribution. In particular, the p-value equals the proportion of test statistics that are at least as extreme as the observed test statistic. Thus, the null hypothesis that the two networks under comparison are the same can be rejected if the latter is larger than expected (i.e., p-value < 0.05).
As for the test statistics that can be used to compare networks, Van Borkulo et al. [165] proposed three metrics that represent both global and local differences, namely invariance of network structure and global strength for the former case, and invariance of edge strength for the latter (see Table 6). Table 6. Network comparison.

Metric
Definition Interpretation

Invariant network structure (Mtest)
It evaluates the null hypothesis that all edges are equal. Specifically, the Chebyshev norm of the vector containing all differences of corresponding edge weights in the two networks is calculated.
If this is higher than some threshold d, then at least one of the differences is larger than d. On the contrary, if the maximum of all differences is not significant, then none of the differences is significant, meaning that the null hypothesis cannot be rejected.

Invariant edge strength (E-test)
It evaluates the null hypothesis that a pair of corresponding edges have equal weight It represents the likelihood that activation of a certain symptom will induce the activation of other direct symptoms [7]. Just like the degree centrality, strength only accounts only for paths of unitary length.

Invariant global strength (S-test)
It evaluates the null hypothesis that the overall level of connectivity (i.e., global strength) is the same across subpopulations.
The above test statistics have already been used in several studies about EDs. Significant results concerning the invariant network structure have been found in various comparisons, in particular: clinical versus nonclinical subsamples of networks assessing comorbidity between EDs and different features of social anxiety disorder [99]; admission against discharge network [36]; adolescents versus adults with AN and BN, with the exception of the comparison between adolescents with BN and adults with BN [77].
Moreover, significant results have been obtained when computing the invariant global strength for many pairs of networks, among those: groups of individuals with low versus high levels of overvaluation of shape and weight, with the latter resulting in higher connectivity [43]; clinical versus nonclinical samples, with a higher density in the former [82]; groups of individuals split based on the median value of the EDE-Q global scores at admission and discharge, with denser networks at admission predicting less change in ED symptomatology during treatment [79]; pre-to post treatment networks, with increased connectivity in the latter [52]; admission against discharge network, with decreased connectivity in the latter [36]; adolescents with AN versus adolescents with BN, adolescents with AN and adults with BN, both cases with higher global strength in the AN sample [77]; men with core ED symptoms versus men without them [47]; men versus women [71]; and across developmental stages [38].
Finally, having found a significant variation in the network structures, Calugi et al.
[36] also tested the change in weight of links from admission to discharge. He identified few connections that were stronger at baseline than at discharge, namely feelings of fatness and desiring weight loss, BMI and vomiting to control shape or weight, and others whose relationship grew at discharge, namely food preoccupation and desiring weight loss, fear of losing control overeating and vomiting to control shape or weight, feelings of fatness and dissatisfaction with weight and shape. Schlegl et al. [77] reported instead the percentage of edges that were significantly different in each pair of networks which resulted to have significant invariant network structure statistics. The values found ranged from 2.56% for the adolescents with AN versus adults with AN comparison to 10% for the adolescents with AN versus adults with BN comparison.
One study assessing the network differences from admission to discharge, although not finding any significant changes in the global strength, reported a significant effect of time on symptom severity, indicating decreases in ED, depression, and anxiety symptoms, with medium to large effect sizes [79]. These results were assessed through repeated measures multivariate analysis of variance (RM MANOVA), which is a statistical technique to determine the degree to which multiple dependent variables (e.g., total scores of psychometric assessment tests) vary across time points.

Discussion
Throughout this work, we first recalled the recent and promising field of psychometric network analysis and we then outlined a comprehensive review of its methods and state-of-theart best practices applied to the processing and study of psychometric data related to EDs. Most of the reviewed studies were based on cross-sectional data retrieved from structured psychometric questionnaires administered to subpopulations of individuals diagnosed with an ED disorder. The most widely used questionnaires to assess ED specific symptoms were EDE-Q and EDI-2, which also accounts for general psychological factors. Other questionnaires widely used to assess nonspecific ED symptoms were SCL-90 and BDI. Only in a few cases, mostly conducted by , the analysis was carried out on panel data collected by means of EMA methods or repeated administration of one or more specific questionnaires to the same sample at different time points. These were the only studies that were able to answer research questions about the dynamic of symptom networks and the intraindividual network structure. Due to the limited number of publications and the considerable clinical implications, our conclusions suggest that future research should give this issue more focus.
With regard to the general characteristics of the participants, the most blatant peculiarity is surely the clear prevalence of female patients. Only one study [47] reported a greater percentage of male participants that, however, were not recruited in a clinical setting.
Almost all studies included in their analysis a node selection step to eliminate redundant items and obtain more accurate results. Those relying on cross-sectional data mainly used a Gaussian Graphical Model with LASSO regularization technique to estimate an undirected symptom network. Only one study [45] was found to employ non-regularized methods, such as association and concentration graphs, and only one [73] produced a directed Bayesian network. The popularity of GGM in recent psychological research is well known [166], but concerns about its generalizability and replicability by using LASSO regularization have been raised due to the relatively small size of psychological datasets [167] and to the type of variables chosen for the analysis. Therefore, a major attention from researchers should be paid to verify that data meets the assumptions of GGM and, if not, choose in favor of more suitable techniques, such as MGM in the case of variables following different distributions or non-regularized methods in the case of datasets with number of parameters much smaller than the number of observations (see, for example, [116,168]).
The parallel estimation of symptom networks on different subsamples was achieved in few cases through the FGL technique [47,62,77,78]. Finally, different studies based on panel data were also found to employ mlVAR to estimate between-subject networks and graphicalVAR to estimate temporal and contemporaneous networks [57-59].
The network description step was focused on the identification of the core symptoms and of the bridge symptoms in case of research questions concerning comorbidities. As for the first point, our survey suggest that the most used network centrality measures are strength and expected influence: the application of these indices highlighted that both specific and nonspecific ED symptoms are central for the development and maintenance of ED psychopathology, in particular shape and weight overvaluation, body dissatisfaction, fear of weight gain, drive for thinness, ineffectiveness, lack of interoceptive awareness, and social insecurity. Similarly, the most used network measures for the identification of bridge symptoms are bridge strength and bridge expected influence. These indices revealed that avoidance of social eating and lack of self-confidence were found to bridge ED with anxiety disorders, whereas feelings of worthlessness, having a negative reaction to wanting to weigh oneself weekly and not wanting to eat in social situations were found to bridge ED with depression. When exploring the relationship with the external field, emotional abuse during childhood has been identified as a highly influential variable for the development of any ED [65,67]. In all longitudinal studies ED specific symptoms like overvaluation of weight and shape and fear of weight gain reported the highest in-strength and out-strength centrality. Finally, the pre-to post-treatment comparison revealed that central symptoms remained constant across all time points, with more severe symptom levels associated with a lower possibility of recovery, higher clinical impairment [44,69]. A change in the role of certain symptoms was found instead by Hilbert et al. Network stability analysis has been conducted in almost all papers under review (explicitly reported in 50 out of 57). In particular, almost all authors employed the bootstrap methods previously described to compute the accuracy of the estimated edge weights, the stability of the order of centrality indices, and difference between specific edge weights or centrality indices.
One common finding is that strength and expected influence generally performs much better, i.e., are more stable, than closeness and betweenness centrality and can thus be assumed to be more reliable indices of the centrality of symptoms. Formally, a major difference in the computation of these measures is that strength and expected influence consider, for each node, only its adjacent nodes, whereas closeness and betweenness are computed based on paths of arbitrary length. Therefore, a possible explanation for the low stability of the latter measures is that the patterns of influence among distant symptoms and psychological traits are extremely variable in terms of population, meaning that small variations in the sample composition produce very different patterns. On the contrary, the direct effects of each symptom on any other in the network do not change significantly by taking an arbitrarily small subsample of the original observations. For a clinical perspective, the detection of symptoms and psychological variables with high closeness and betweenness centrality could be extremely meaningful, since it would help them target those factors responsible for the development of comorbidities and thus avoid a worsening in the severity of the clinical picture of the patient. Because of the inadequate results obtained from cross-sample data in this context, further research exploring the features of individual symptom networks is recommendable.
The NCT was employed to reveal differences in the symptom networks of samples with different characteristics. In particular, our study reported many cases in which similarities in network structures were found, although with different levels of connectivity. An important note should be mentioned here about the global strength of pre-and posttreatment networks. According to the network theory of psychopathology, effective treatment should lead to a decrease in network connectivity and its self-sustaining character, but studies assessing this assumption reported contrasting results that either supported it [36] or not [52], suggesting that further research on the predictive value of network variables in the therapeutic outcome is needed.
As regards the software tools available, our data highlight the fact that the state-of-the-art procedures are all based on a collection of R packages specifically designed for the network analysis of psychometric data. To our knowledge, no study employed other software commonly used in network science, such as Cytoscape [169], Pajek [170], or Python libraries, whose integration might bring significant contributions to the field of psychometric network analysis (see, for example, [171]). In particular, the use of free, general purpose and distinctly "user friendly" tools such as Cytoscape could bring clinicians closer to the network approach, and network experts to the field of psychopathologies, and facilitate a broader, more aware and interdisciplinary use of these potentially very effective methods.

Conclusion
The aim of this study was to present characteristics, usage and output of the main networkbased methods applied to EDs psychometric data, including their similarities and differences. We here presented the largest and most comprehensive review about psychometric network analysis methods up to date, taking into consideration 57 works published from 2016 (our queries did not retrieve any psychometric network analysis paper published before that date) to early 2022, and allowing to fill some of the gaps present and recognized in previous reviews in terms of article coverage and specific focus on network-based methodologies.
One major contribution of this article that was missing in the previous reviews is in the inclusion of studies based on panel data. This kind of data is fundamental to estimate temporal and intraindividual networks, which both might lead to significant clinical findings, such as the psychological dynamics responsible for the oscillation among different EDs during the lifetime of a patient, or the specific psychological variables that, due to the unique personal history of each patient, are involved in the maintenance and development of comorbidities and should be thus targeted for clinical intervention.
In conclusion, what emerged from our study is that there is a general agreement in the methodologies to be used for psychometric network analysis that depict a coherent image describing a strong symptom network where both specific and nonspecific ED symptoms are central for the development and maintenance of ED psychopathology. However, this image is still incomplete. Firstly, because the results of the present systematic review suggest that methodological developments are still needed to model both temporal and intraindividual symptoms and to integrate different input information into one single network structure. A possible direction to accomplish these tasks might be the multilayer network approach [172], according to which a complex system can be modeled as a network of networks, in other words, as a set of multiple layers with connections between and within them [173] In the case of mental disorders, one can think of extending the study of symptom networks with other entities (such as genetic factors, brain structure and functional connectivity, environmental factors) as layers in a multilayer network. An attempt to implement this approach has already been proposed to integrate multiple levels of personality, namely neural and psychological constituents [174], but an application to psychopathologies is also advised [175].
Furthermore, even though the studies here collected and reviewed suggest that network methods can be useful and effective in clinical practice, to the best of our knowledge the great majority of such studies have not undergone experimental verification yet.